KAFKA-19905: Fix tight reconnection loop during shutdown #20950

fvaleri · 2025-11-21T15:04:07Z

This patch fixes a tight broker-to-controller reconnection loop that may happen during shutdown. The sequence is:

1. Nodes 1 and 2 (brokers) request a controlled shutdown.
2. The controller grants the shutdown.
3. The controller itself shuts down (RaftManager shutdown).
4. Nodes 1 and 2 keep trying to heartbeat to the now-dead controller.
5. They get stuck in a reconnection loop because the NodeToControllerRequestThread is still running and has not been shut down properly.

The reconnection loop goes on for 5 minutes, which is the shutdown timeout hard-coded in the KafkaBroker trait.
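The failure mode described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the change made in this PR and not Kafka code: `ReconnectLoopSketch`, `FakeController`, and `heartbeatAttempts` are hypothetical names, and `maxAttempts` stands in for the shutdown timeout so that the example stays deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: every name here is hypothetical and none is a Kafka class.
class ReconnectLoopSketch {

    /** Stand-in for the controller endpoint; connect() fails once it is stopped. */
    static final class FakeController {
        private final AtomicBoolean running = new AtomicBoolean(true);

        void stop() {
            running.set(false);
        }

        boolean connect() {
            return running.get();
        }
    }

    /**
     * Counts connection attempts until one succeeds, the attempt budget is
     * exhausted, or (only when stopOnShutdown is true) the shutdown flag is seen.
     * maxAttempts is an assumption that replaces a wall-clock timeout.
     */
    static int heartbeatAttempts(FakeController controller,
                                 AtomicBoolean shuttingDown,
                                 boolean stopOnShutdown,
                                 int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            if (stopOnShutdown && shuttingDown.get()) {
                break; // give up as soon as shutdown has been requested
            }
            attempts++;
            if (controller.connect()) {
                return attempts; // connected, no further retries needed
            }
            // No backoff here on purpose: this is the "tight" loop.
        }
        return attempts;
    }

    public static void main(String[] args) {
        FakeController controller = new FakeController();
        AtomicBoolean shuttingDown = new AtomicBoolean(true);
        controller.stop(); // the controller goes away before the brokers finish

        System.out.println("unguarded attempts: "
                + heartbeatAttempts(controller, shuttingDown, false, 1000));
        System.out.println("guarded attempts:   "
                + heartbeatAttempts(controller, shuttingDown, true, 1000));
    }
}
```

In this model, with the controller stopped and the shutdown flag set, the unguarded loop uses its whole attempt budget, while the guarded loop makes no attempt at all. A real fix would also have to decide when it is safe to stop heartbeating, which this sketch does not address.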

This is what I see in the logs of another test run for one of the brokers:

SIGTERM received: 14:39:46,282
Actual shutdown completed: 14:44:46,385
Time elapsed: 5 minutes and 0.103 seconds (approximately 5 minutes)
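The elapsed time can be checked directly from the two log timestamps. The following is a minimal sketch using `java.time`, assuming both timestamps fall on the same day and use the `HH:mm:ss,SSS` layout shown above; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for checking the shutdown duration from log timestamps.
class ShutdownElapsedSketch {

    // The log lines above use a comma before the milliseconds.
    private static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("HH:mm:ss,SSS");

    /** Returns the milliseconds between two same-day log timestamps. */
    static long elapsedMillis(String start, String end) {
        LocalTime from = LocalTime.parse(start, LOG_TIME);
        LocalTime to = LocalTime.parse(end, LOG_TIME);
        return Duration.between(from, to).toMillis();
    }

    public static void main(String[] args) {
        long millis = elapsedMillis("14:39:46,282", "14:44:46,385");
        System.out.println(millis); // prints 300103
    }
}
```

300103 ms is 5 minutes plus 103 ms, which matches the figure reported above.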

I acknowledge that this is unlikely to happen with brokers running on different machines, but not so unlikely when running tests locally on a single physical machine.

Signed-off-by: Federico Valeri <[email protected]>

github-actions bot added the triage (PRs from the community), core (Kafka Broker), and small (Small PRs) labels Nov 21, 2025

fvaleri force-pushed the fix-shutdown-loop branch from 163c3f1 to 93b57cf November 21, 2025 15:33

fvaleri marked this pull request as draft November 21, 2025 15:36

fvaleri force-pushed the fix-shutdown-loop branch from 93b57cf to c00e3d1 November 25, 2025 15:38

fvaleri force-pushed the fix-shutdown-loop branch from c00e3d1 to 5bae582 November 25, 2025 15:39

fvaleri closed this Nov 25, 2025

fvaleri deleted the fix-shutdown-loop branch November 25, 2025 17:41
